1 Introduction

FastTrack is a (nearly) automated formant tracking tool that enables you to extract formant contours quickly and easily. The tool runs as a plug-in in Praat, built from a collection of Praat scripts, so there is no need for you to code anything. There is also room for customising the scripts to create a more individualised tool. FastTrack is developed by Santiago Barreda at UC Davis, and more information can be found here: https://github.com/santiagobarreda/FastTrack

Also, the following paper explains the tool in more detail:

Barreda, S. (2021). Fast Track: fast (nearly) automatic formant-tracking using Praat. Linguistics Vanguard, 7(1), 20200051. https://doi.org/10.1515/lingvan-2020-0051

1.1 Workflow overview

In the workshop, I will demonstrate my typical workflow of acoustic (spectral) analysis using FastTrack. I usually follow these steps:

  1. Record audio data (well, quite obvious)

  2. Annotate audio files in Praat TextGrids

  • I usually start this using forced alignment (e.g., Montreal Forced Aligner: MFA) and then refine segment boundaries manually.
  • I use MFA because it can annotate segments using ARPABET that FastTrack can recognise.
  • It is helpful to create at least two tiers: word (orthography) and phone (ARPABET). You can manually annotate the words first, and then MFA will create the phone tier for you.
  3. Define segments of interest
  • FastTrack starts to kick in here. By default, FastTrack looks for vowels, as it is designed primarily for analysing vowels (or, more specifically, vowel inherent spectral change: VISC). But the minimum requirement for using FastTrack is that formant structure is visible throughout each audio file to be analysed, and you can tweak the settings to analyse sonorant consonants (e.g., liquids, semi-vowels).
  4. Extract vowels (or vocalic portions of interest)
  • FastTrack estimates formants throughout each audio file. This means that it achieves the highest formant estimation accuracy when spectral structure is visible throughout each file.
  • FastTrack has an Extract vowels function that enables you to extract the vocalic portions of interest before analysis.
  5. Estimate formants using Track Folder
  • Once individual segments have been extracted, you can bulk-estimate formant frequencies for all audio files stored in the same directory.
  • Some parameters need to be specified, such as the maximum and minimum frequency of the analysis windows, the number of formants to be extracted (3 or 4), the number of data points (i.e., "bins") to be output, etc.
  6. Visualise and run stats using R
  • This is the fun bit!

1.2 Workshop objectives

In this workshop, I will mainly explain and demonstrate steps 3-5 from above. If you would like to follow along, you can install FastTrack beforehand. A detailed step-by-step guide is available in Santiago’s GitHub repository with some video illustrations. See the wiki on his GitHub repository for the tutorial on installation (and many other things!).

1.3 Data

We are going to analyse vowel production from “the North Wind and the Sun” passage produced by speakers of different L1 backgrounds. We will use data from the ALLSSTAR Corpus, which contains a large number of spontaneous and scripted speech recordings produced by English speakers from different language backgrounds:

Bradlow, A. R. (n.d.) ALLSSTAR: Archive of L1 and L2 Scripted and Spontaneous Transcripts And Recordings. Retrieved from https://oscaar3.ling.northwestern.edu/ALLSSTARcentral/#!/recordings.

You can download a subset of the corpus data to work with in this workshop from here.

The data contains recordings of the North Wind and the Sun passage by 22 speakers from various L1 backgrounds: Chinese-Cantonese (n = 4), Chinese-Mandarin (n = 4), English (n = 4), Japanese (n = 2), Korean (n = 4), and Spanish (n = 4). In each language group, half the speakers are female and the other half male. The file name convention is: ALL_[speaker number]_[gender: F or M]_[L1: CCT, CMN, ENG, JPN, KOR, SPA]_[L2: ENG]_NWS. Each audio file is accompanied by an annotated TextGrid file, generated by Montreal Forced Aligner.
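As a quick sanity check, the metadata encoded in a file name can be pulled apart by splitting on underscores. A minimal base-R sketch (the variable names here are just for illustration):

```r
# Split a file name like ALL_005_M_CMN_ENG_NWS into its fields
fname <- "ALL_005_M_CMN_ENG_NWS"
parts <- strsplit(fname, "_")[[1]]

speaker <- parts[2]  # speaker number, e.g. "005"
gender  <- parts[3]  # "F" or "M"
l1      <- parts[4]  # L1 code, e.g. "CMN"
```

The same fields are extracted later (with `str_sub()`) when we build the analysis data frame in R.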

2 Let’s begin the analysis!

2.1 Step 1: Extracting vowels

We have recorded speech data from participants and/or obtained corpus data already. After some agony, we have managed to segment everything and are now ready to proceed to acoustic analysis.

When using FastTrack, the first thing we need to do is to extract vocalic portions that we would like to analyse. Let’s extract vowels using FastTrack before submitting them to formant estimation.

FastTrack extracts the segments specified in the spreadsheet vowelstoextract_default.csv. By default, the csv file lists vowels, but you can modify the list if you are interested in extracting other types of sounds (e.g., liquids, semi-vowels). You can find this file by going to: FastTrack-master -> Fast Track -> dat.

  • When changing something here, I would strongly suggest that you keep a copy of the default file, too. FastTrack only recognises the spreadsheet when it is named vowelstoextract_default.csv, which means that you can just give a different name to any spreadsheet that you’d like to keep. For example, when I analysed liquids, I first copied the default file and renamed the copy vowelstoextract_vowel.csv. I then modified the default file so that the list only contained /l/ and /r/.

2.1.1 Procedure

Here is a somewhat detailed workflow:

  1. Download data.zip and save it somewhere on your computer.

  2. FastTrack requires a certain directory structure, so let’s set this up now. Specifically, we’ll need to save the audio and TextGrid files in separate folders, named sounds and textgrids, respectively. Create new folders, give them the appropriate names, and save the files in each folder.

  • It might also be useful to create an output folder at this stage, too. This is where the extracted files, which we will use for formant estimation later on, will be written out.
  3. Open Praat and throw a random file into the objects window. This will trigger the FastTrack functions to appear in the menu section.

  4. Select Tools…, then Extract vowels with TextGrids.

  5. Once a window pops up, specify the following:

  • Sound folder:
    • Path to the "sounds" folder in the data folder containing the .wav files.
  • TextGrid folder:
    • Path to the "textgrids" folder in the data folder containing the .TextGrid files.
  • Output folder:
    • Path to the folder where you wish to save the outputs. You can specify an existing location.
  • Which tier contains segmentation information?:
    • Specify the tier in which phonemic transcription/segmentation has been performed. In the current example, the segmentation is done in Tier 2 so type 2.
  • Which tier contains word information?:
    • Specify the tier with words. Type 1 in this case.
  • Is stress marked on vowels?:
    • Tick the box if you wish to take stress into account. If you use MFA to segment the speech, stress is marked alongside each vowel. For example, you will find “AE1”, which means a TRAP vowel that bears primary stress, “AE2” for secondary stress, etc.
Vowel extraction setting window


  6. Press OK and execute!
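If you prefer to script the folder setup from step 2, a minimal base-R sketch might look like this (the data path is just an example; point it at wherever you unzipped data.zip):

```r
data_dir <- "data"  # example path: wherever you saved the data

# Create the folders FastTrack expects
for (d in c("sounds", "textgrids", "output")) {
  dir.create(file.path(data_dir, d), recursive = TRUE, showWarnings = FALSE)
}

# Move .wav files into sounds/ and .TextGrid files into textgrids/
wavs <- list.files(data_dir, pattern = "\\.wav$", full.names = TRUE)
file.rename(wavs, file.path(data_dir, "sounds", basename(wavs)))

tgs <- list.files(data_dir, pattern = "\\.TextGrid$", full.names = TRUE)
file.rename(tgs, file.path(data_dir, "textgrids", basename(tgs)))
```

Doing this in a script makes it easy to rebuild the structure for a new batch of recordings.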

2.1.2 Output files

Let’s check what files have been created at this stage. Go to the output folder to check what it contains:

  • segmentation_information.csv: A detailed summary of the extraction process, including input and output (audio) files, labels, duration, previous and next adjacent segments and stress information.

  • file_information.csv: A brief summary of the correspondence between output files, labels and colours. Colours are relevant when visualising vowels using Praat.

  • sounds: Extracted audio files. You’ll notice that the file names now have some numbers added to the end (e.g., ALL_005_M_CMN_ENG_NWS_0002.wav), indicating the order of extraction from the original audio file.

2.2 Step 2: Tracking a folder

Having extracted vowels (or any vocalic segments), we’re now ready to move onto the fun bit: formant tracking! Here is what FastTrack does:

  • FastTrack runs many candidate analyses and automatically chooses the best one based on regression. It can also return images of all candidates and the winners for visual inspection.

  • It estimates the formant frequencies at multiple time points throughout the vowel duration.

  • The output is a csv file summarising the analysis, which can then be imported into R for tidying up, visualisation, statistics, etc.

2.2.1 Procedure

Formant estimation is based on the output folder from the extraction stage. Do not delete or move anything from the folder!

  1. Make sure that you know where the output folder is. This should contain at least: (1) file_information.csv, (2) segmentation_information.csv, and (3) a sounds folder containing a bunch of segmented audio files.

  2. Open Praat and throw a random file in the object window. This will trigger the FastTrack functions to appear in the menu section (if FastTrack is installed properly).

  3. Select Track folder….

  4. Specify the path to the output folder in the “Folder” section. (Hint: this is not the path to the sounds folder!)

  5. Adjust parameters for your needs. This includes:

  • Lowest/highest analysis frequency: The range for the upper limit of the frequency window within which FastTrack seeks formants. FastTrack alters the ceiling of the analysis window in a number of steps to identify the optimal formant estimation.

    • Note: It is recommended that we do formant tracking for female and male speakers separately due to anatomical differences such as vocal tract length (details here). Given this, we will conduct the analysis using the 5000-7000 Hz range for female speakers and the 4500-6500 Hz range for male speakers.
  • Number of steps: Basically, the number of iterations of the upper-limit adjustment. A value of 24 means that FastTrack adjusts the upper frequency limit in 24 steps from the lowest to the highest analysis frequency specified in the previous step.

  • Number of formants: Obviously, how many formants you’d like to extract. This has an impact on the formant estimation accuracy, so explore a little if you don’t get a satisfactory analysis.

  • Make images comparing analysis/showing winners: You can choose whether you’d like a .png file for each analysis and each winner. I’d always tick the boxes here, but this depends on how much storage you have.

  • Also, do not forget to tick Show Progress: otherwise it would look like the computer has frozen, and you would have no idea how long processing will take.

  6. Hit OK and run it!
Setting window for formant estimation (left) and an example of comparison image (right)
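To get a feel for what the number of steps does, here is a quick sketch of the analysis ceilings it implies, assuming the steps are spaced evenly between the lowest and highest analysis frequencies (using the ranges chosen above):

```r
# 24 evenly spaced analysis ceilings between the lowest and highest frequency
ceilings_f <- seq(5000, 7000, length.out = 24)  # female settings
ceilings_m <- seq(4500, 6500, length.out = 24)  # male settings

round(ceilings_f[1:4])  # first few ceilings that would be tried
# -> 5000 5087 5174 5261
```

Each ceiling yields one candidate analysis, and FastTrack then picks the best of the 24 candidates.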

2.2.2 Output files

You’ll get quite a few output files from this stage. Let’s take a look at some of them that are most relevant here:

  • aggregated_data.csv: A spreadsheet summarising (1) input file, (2) duration, and (3) formant measurements averaged into the pre-specified number of bins (e.g., 11) for each vowel analysed here. This can be found in the processed_data folder. (See the image below for a quick overview.)

  • winners.csv: A spreadsheet summarising which analysis step yielded the most accurate formant tracking based on regression. Useful when you are not satisfied with the winner auto-selected by FastTrack and think another analysis is better.

  • images_comparison: A folder showing the results of the step-wise formant estimation. Useful when evaluating formant tracking quality relative to each of the analysis steps.

  • images_winner: A folder containing images of each “winning” analysis for each sound file.

  • csvs: A folder containing the initial formant tracks sampled at 2 ms intervals (before FastTrack bins them into a smaller number of data points) for each winning analysis. Useful for finer-grained analysis.
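If you need the finer-grained tracks, the per-token csvs can be stacked into a single data frame. A sketch (the path is an example, and I am assuming each file in csvs is a plain csv; check the column names of one file first):

```r
library(tidyverse)

csv_dir <- "output/csvs"  # example path into the Track Folder output

# Read every per-token csv and tag each row with its source file
df_fine <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE) |>
  purrr::map_dfr(\(f) readr::read_csv(f, show_col_types = FALSE) |>
                   dplyr::mutate(file = basename(f)))
```

From here the same wrangling as for aggregated_data.csv applies, just with many more rows per vowel.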

An example of aggregated_data.csv

3 Data processing and analysis using R

Acoustic analysis is done, hooray! Now let’s move onto the more fun part – data wrangling, visualisation and analysis using R. I can think of two broad paths to data analysis, and I’ll explain them one by one below.

3.1 Based on aggregated_data.csv

3.1.1 Data processing

An easier way to analyse the data is to use aggregated_data.csv. Here, let’s convert the file into a more tidyverse-friendly format and then try some plotting.

The data transformation here is based on code originally written by Dr Sam Kirkham (Lancaster University). We will first import the relevant data sets: aggregated_data.csv and segmentation_information.csv.

We have separate data sets for female and male speakers, so we’ll import them separately and merge them at a later stage.

Female data:

# load packages
library(tidyverse)

# import data sets: female data
## aggregated_data.csv
df_aggr_f <- readr::read_csv("/Volumes/Samsung_T5/data/female/output/processed_data/aggregated_data.csv")
## "aggregated_data.csv" contains information about formant frequency values, duration, f0, etc. but lacks other information.

## segmentation info
df_segment_info_f <- readr::read_csv("/Volumes/Samsung_T5/data/female/output/segmentation_information.csv")
## "segmentation_information.csv" supplements "aggregated_data.csv" with information about the context of the extracted sounds, vowel duration, stress, comments, etc.

df_segment_info_f <- df_segment_info_f |> 
  dplyr::rename(file = outputfile)
## Rename the "outputfile" column to "file" so that it is compatible with the "aggregated_data.csv".

df_f <- merge(df_aggr_f, df_segment_info_f, by = "file", all = T)
## Merging the two csv files by the "file" column

df_f <- na.omit(df_f) # omitting NA

Male data:

# import data sets: male data
## aggregated_data.csv
df_aggr_m <- readr::read_csv("/Volumes/Samsung_T5/data/male/output/processed_data/aggregated_data.csv")
## "aggregated_data.csv" contains information about formant frequency values, duration, f0, etc. but lacks other information.

## segmentation info
df_segment_info_m <- readr::read_csv("/Volumes/Samsung_T5/data/male/output/segmentation_information.csv")
## "segmentation_information.csv" supplements "aggregated_data.csv" with information about the context of the extracted sounds, vowel duration, stress, comments, etc.

df_segment_info_m <- df_segment_info_m |> 
  dplyr::rename(file = outputfile)
## Rename the "outputfile" column to "file" so that it is compatible with the "aggregated_data.csv".

df_m <- merge(df_aggr_m, df_segment_info_m, by = "file", all = T)
## Merging the two csv files by the "file" column

df_m <- na.omit(df_m) # omitting NA

Let’s then merge the female and male data and reshape the data frame into long format.

# combine female and male data
df <- rbind(df_f, df_m)

df_long <- df %>%
  tidyr::pivot_longer(contains(c("f1", "f2", "f3")), # add "f4" if you extract F4 as well 
               names_to = c("formant", "timepoint"), 
               names_pattern = "(f\\d)(\\d+)",
               values_to = "hz")

Then, we will add proportional time information – we have extracted formant frequencies summarised into 11 bins, meaning that we can express the temporal information from 0% to 100% in 10% increments.

If you recall, the audio file names contain information about the speaker background (e.g., ALL_005_M_CMN_ENG_NWS). Let’s add them into the data frame, too.

df_long <- df_long |> 
  tidyr::spread(key = formant, value = hz) |> 
  dplyr::select(-duration.y) |> # drop one of the two duration columns
  dplyr::rename(
    duration = duration.x) |> # rename the duration column
  dplyr::mutate(
    timepoint = as.numeric(timepoint),  
    percent = (timepoint - 1) * 10, # adding proportional time
    speaker =
      str_sub(file, start = 5, end = 7), # speaker ID: three digits
    speaker = as.factor(speaker),
    gender = 
      str_sub(file, start = 9, end = 9), # gender: F or M
    L1 =
      str_sub(file, start = 11, end = 13), # L1: CMN, CCT ...
  )

# within-speaker normalisation
df_long <- df_long |> 
  dplyr::group_by(speaker) |> 
  dplyr::mutate(
    f1z = scale(f1),
    f2z = scale(f2),
    f3z = scale(f3)
  ) |> 
  dplyr::ungroup()

# check data
df_long |> 
  dplyr::group_by(gender, L1) |> 
  dplyr::summarise() |> 
  dplyr::ungroup()
## `summarise()` has grouped output by 'gender'. You can override using the
## `.groups` argument.
## # A tibble: 12 × 2
##    gender L1   
##    <chr>  <chr>
##  1 F      CCT  
##  2 F      CMN  
##  3 F      ENG  
##  4 F      JPN  
##  5 F      KOR  
##  6 F      SPA  
##  7 M      CCT  
##  8 M      CMN  
##  9 M      ENG  
## 10 M      JPN  
## 11 M      KOR  
## 12 M      SPA

The data looks good! On we go to visualisation!

3.1.2 Data visualisation

Let’s try some data visualisation. Having temporal information in a proportional manner is useful because you can extract formant frequencies at an arbitrary point in time during each vowel interval.

Let’s first try visualising monophthongs based on the midpoint measurement. We don’t filter anything here, as it’d be interesting to see some variability in the data. But, as the commented-out portion shows, you could filter tokens out based on, e.g., surrounding segments.

df_long_mono <- df_long |> 
  dplyr::filter(
    # !next_sound %in% c("R", "W", "Y"), # monophthongs followed by /r/, /w/, and /j/ were avoided
    # !previous_sound %in% c("R", "W", "Y"), # monophthongs preceded by /r/, /w/, and /j/ were avoided
    # !next_sound %in% c("L", "NG"), # monophthongs followed by /l/ and /ng/ were avoided
    percent == 50, # specifying vowel midpoint
    !(vowel %in% c("AW", "AY", "EY", "OW", "OY")) # monophthongs
  ) |> 
  dplyr::mutate(
    vowel_ipa =
      case_when(
        str_detect(vowel, "AA") ~ "ɑ",
        str_detect(vowel, "AE") ~ "æ",
        str_detect(vowel, "AH") ~ "ʌ",
        str_detect(vowel, "AO") ~ "ɔ",
        str_detect(vowel, "EH") ~ "ɛ",
        str_detect(vowel, "ER") ~ "ɝ",
        str_detect(vowel, "IH") ~ "ɪ",
        str_detect(vowel, "IY") ~ "i",
        str_detect(vowel, "UH") ~ "ʊ",
        str_detect(vowel, "UW") ~ "u",
        )
    ) # add IPA symbols for visualisation

We need mean formant values where we’ll put the IPA labels.

# Calculate vowel means
df_mean <- df_long_mono |> 
  dplyr::group_by(gender, L1, vowel, vowel_ipa) |> 
  dplyr::summarise(
    m_f1 = mean(f1z),
    m_f2 = mean(f2z)
  ) |> 
  dplyr::ungroup() |> 
  dplyr::mutate(
    L1 = case_when(
      L1 == "CCT" ~ "Cantonese",
      L1 == "CMN" ~ "Mandarin",
      L1 == "ENG" ~ "English",
      L1 == "KOR" ~ "Korean",
      L1 == "JPN" ~ "Japanese",
      L1 == "SPA" ~ "Spanish",
    ) # making L1 labels to be more readable
  ) 
## `summarise()` has grouped output by 'gender', 'L1', 'vowel'. You can override
## using the `.groups` argument.
# plot
df_long_mono |> 
  dplyr::mutate(
    L1 = case_when(
      L1 == "CCT" ~ "Cantonese",
      L1 == "CMN" ~ "Mandarin",
      L1 == "ENG" ~ "English",
      L1 == "KOR" ~ "Korean",
      L1 == "JPN" ~ "Japanese",
      L1 == "SPA" ~ "Spanish",
    ) # making L1 labels to be more readable
  ) |> 
  ggplot(aes(x = f2z, y = f1z, colour = vowel_ipa)) +
  geom_point(size = 1, alpha = 0.5, show.legend = FALSE) +
  geom_label(data = df_mean, aes(x = m_f2, y = m_f1, label = vowel_ipa, colour = vowel_ipa), show.legend = FALSE) +
  scale_x_reverse(position = "top") +
  scale_y_reverse(position = "right") +
  labs(x = "normalised F2\n", y = "normalised F1\n", title = "vowel midpoint") +
  facet_grid(gender ~ L1) +
  theme(axis.text = element_text(size = 8),
        axis.title = element_text(size = 15),
        strip.text.x = element_text(size = 15),
        strip.text.y = element_text(size = 15, angle = 0),
        plot.title = element_text(size = 20, hjust = 0, face = "bold")
  ) 

We can also plot temporal changes in formant frequency. For example, here is a comparison of F2 dynamics between L1 English and L1 Japanese speakers. Please feel free to explore any other L1 comparisons!

df_long |> 
  dplyr::filter(
    L1 %in% c("ENG", "JPN") # change for different L1 pairs
  ) |> 
  ggplot(aes(x = percent, y = f2z, colour = L1)) +
  geom_point(alpha = 0.05) +
  geom_path(aes(group = number), alpha = 0.05) +
  geom_smooth(aes(group = L1)) + # you could also add smooths
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
  labs(x = "proportional time", y = "normalised F2\n", title = "F2 dynamics") +
  facet_wrap( ~ vowel) +
  scale_colour_manual(values = alpha(c("brown4", "blue4"))) +
  theme(axis.text = element_text(size = 10),
        axis.title = element_text(size = 15),
        strip.text.x = element_text(size = 15),
        strip.text.y = element_text(size = 15, angle = 0),
        plot.title = element_text(size = 20, hjust = 0, face = "bold")
  ) 
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

3.1.3 Data analysis

Finally, we could try fitting some statistical models to investigate whether vowel realisations differ depending on the speaker’s L1 background.

We could fit ordinary linear-mixed effect models for the midpoint measurement.

library(lme4)
library(lmerTest)
library(emmeans)

# converting variables into factors and dropping empty levels
df_long_mono$vowel <- droplevels(as.factor(df_long_mono$vowel))
df_long_mono$speaker <- as.factor(df_long_mono$speaker)
df_long_mono$L1 <- as.factor(df_long_mono$L1)

# run model -- random intercepts for speaker prevented the model from converging, so we just have random intercepts for item (i.e., word)
m1 <- lme4::lmer(f2z ~ L1 + vowel + L1:vowel + (1|word), data = df_long_mono, REML = FALSE)

## check what optimiser would let the model converge
lme4::allFit(m1)
## bobyqa : [OK]
## Nelder_Mead : [OK]
## nlminbwrap : [OK]
## optimx.L-BFGS-B : [OK]
## nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
## nloptwrap.NLOPT_LN_BOBYQA : [OK]
## original model:
## f2z ~ L1 + vowel + L1:vowel + (1 | word) 
## data:  df_long_mono 
## optimizers (6): bobyqa, Nelder_Mead, nlminbwrap, optimx.L-BFGS-B,nloptwrap.NLOPT_LN_NELDERME...
## differences in negative log-likelihoods:
## max= 1.54e-09 ; std dev= 7.03e-10
## model summary
summary(m1)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: f2z ~ L1 + vowel + L1:vowel + (1 | word)
##    Data: df_long_mono
## 
##      AIC      BIC   logLik deviance df.resid 
##   2727.1   3060.9  -1301.6   2603.1     1546 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.9017 -0.4755  0.0164  0.4468  6.1629 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  word     (Intercept) 0.06991  0.2644  
##  Residual             0.27707  0.5264  
## Number of obs: 1608, groups:  word, 50
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)   -0.118707   0.294515  -0.403
## L1CMN         -0.539820   0.355897  -1.517
## L1ENG         -0.581840   0.355897  -1.635
## L1JPN         -0.611211   0.442639  -1.381
## L1KOR         -0.217421   0.355897  -0.611
## L1SPA         -0.076447   0.320113  -0.239
## vowelAE        0.269955   0.314502   0.858
## vowelAH       -0.406756   0.307074  -1.325
## vowelAO       -0.948149   0.306740  -3.091
## vowelEH        0.760033   0.334712   2.271
## vowelER       -0.154441   0.410909  -0.376
## vowelIH        1.397168   0.318119   4.392
## vowelIY        1.663599   0.337066   4.936
## vowelUH       -0.200448   0.364873  -0.549
## vowelUW       -0.434380   0.337316  -1.288
## L1CMN:vowelAE  0.586413   0.369831   1.586
## L1ENG:vowelAE  0.673613   0.372788   1.807
## L1JPN:vowelAE  0.463102   0.458309   1.010
## L1KOR:vowelAE  0.127476   0.369314   0.345
## L1SPA:vowelAE -0.201238   0.335845  -0.599
## L1CMN:vowelAH  0.546055   0.371211   1.471
## L1ENG:vowelAH  0.513164   0.372246   1.379
## L1JPN:vowelAH  0.599221   0.460871   1.300
## L1KOR:vowelAH  0.064547   0.371572   0.174
## L1SPA:vowelAH  0.101444   0.334452   0.303
## L1CMN:vowelAO  0.646956   0.368444   1.756
## L1ENG:vowelAO  0.655929   0.368808   1.779
## L1JPN:vowelAO  0.475693   0.457285   1.040
## L1KOR:vowelAO  0.254230   0.368310   0.690
## L1SPA:vowelAO  0.591621   0.334718   1.768
## L1CMN:vowelEH  0.375032   0.394024   0.952
## L1ENG:vowelEH  0.211821   0.391358   0.541
## L1JPN:vowelEH  0.571296   0.491661   1.162
## L1KOR:vowelEH -0.254598   0.393872  -0.646
## L1SPA:vowelEH  0.031268   0.362025   0.086
## L1CMN:vowelER  1.085664   0.478410   2.269
## L1ENG:vowelER  0.699930   0.495980   1.411
## L1JPN:vowelER  0.638206   0.558879   1.142
## L1KOR:vowelER  0.298354   0.478410   0.624
## L1SPA:vowelER  0.369148   0.429250   0.860
## L1CMN:vowelIH  0.364996   0.372332   0.980
## L1ENG:vowelIH -0.056797   0.377103  -0.151
## L1JPN:vowelIH  0.488462   0.460929   1.060
## L1KOR:vowelIH  0.005048   0.373606   0.014
## L1SPA:vowelIH -0.291504   0.337040  -0.865
## L1CMN:vowelIY  0.517614   0.390543   1.325
## L1ENG:vowelIY  1.120387   0.390700   2.868
## L1JPN:vowelIY  0.767789   0.483504   1.588
## L1KOR:vowelIY  0.193151   0.389763   0.496
## L1SPA:vowelIY -0.324448   0.358409  -0.905
## L1CMN:vowelUH  0.855794   0.415741   2.058
## L1ENG:vowelUH  0.825257   0.415741   1.985
## L1JPN:vowelUH  1.020869   0.524152   1.948
## L1KOR:vowelUH  0.723502   0.421344   1.717
## L1SPA:vowelUH  0.027879   0.385552   0.072
## L1CMN:vowelUW  0.604674   0.389030   1.554
## L1ENG:vowelUW  1.486826   0.392539   3.788
## L1JPN:vowelUW  0.791417   0.486262   1.628
## L1KOR:vowelUW  0.546013   0.390513   1.398
## L1SPA:vowelUW  0.080387   0.358163   0.224
# significance testing
## nested model for the interaction
m2 <- lme4::lmer(f2z ~ L1 + vowel + (1|word), data = df_long_mono, REML = FALSE)

## model comparison: full model significantly improves the model fit
anova(m1, m2, test = "Chisq")
## Data: df_long_mono
## Models:
## m2: f2z ~ L1 + vowel + (1 | word)
## m1: f2z ~ L1 + vowel + L1:vowel + (1 | word)
##    npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## m2   17 2823.8 2915.3 -1394.9   2789.8                         
## m1   62 2727.2 3060.9 -1301.6   2603.2 186.65 45  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## post-hoc analysis
emmeans::emmeans(m1, pairwise ~ L1 | vowel)
## $emmeans
## vowel = AA:
##  L1    emmean    SE  df lower.CL  upper.CL
##  CCT -0.11871 0.308 572  -0.7230  0.485606
##  CMN -0.65853 0.357 499  -1.3592  0.042161
##  ENG -0.70055 0.357 499  -1.4012  0.000141
##  JPN -0.72992 0.446 936  -1.6050  0.145165
##  KOR -0.33613 0.357 499  -1.0368  0.364561
##  SPA -0.19515 0.275 628  -0.7350  0.344657
## 
## vowel = AE:
##  L1    emmean    SE  df lower.CL  upper.CL
##  CCT  0.15125 0.118 168  -0.0821  0.384641
##  CMN  0.19784 0.121 178  -0.0401  0.435788
##  ENG  0.24302 0.131 236  -0.0148  0.500819
##  JPN  0.00314 0.137 295  -0.2657  0.272000
##  KOR  0.06130 0.119 169  -0.1729  0.295551
##  SPA -0.12644 0.121 183  -0.3652  0.112282
## 
## vowel = AH:
##  L1    emmean    SE  df lower.CL  upper.CL
##  CCT -0.52546 0.119 201  -0.7599 -0.291044
##  CMN -0.51923 0.120 210  -0.7548 -0.283651
##  ENG -0.59414 0.126 235  -0.8416 -0.346727
##  JPN -0.53745 0.143 388  -0.8178 -0.257151
##  KOR -0.67834 0.121 219  -0.9159 -0.440725
##  SPA -0.50047 0.113 167  -0.7232 -0.277713
## 
## vowel = AO:
##  L1    emmean    SE  df lower.CL  upper.CL
##  CCT -1.06686 0.114 181  -1.2912 -0.842552
##  CMN -0.95972 0.116 190  -1.1878 -0.731640
##  ENG -0.99277 0.119 193  -1.2279 -0.757667
##  JPN -1.20238 0.131 327  -1.4594 -0.945388
##  KOR -1.03005 0.114 183  -1.2559 -0.804173
##  SPA -0.55168 0.118 193  -0.7854 -0.317976
## 
## vowel = EH:
##  L1    emmean    SE  df lower.CL  upper.CL
##  CCT  0.64133 0.167 492   0.3141  0.968554
##  CMN  0.47654 0.161 487   0.1599  0.793137
##  ENG  0.27131 0.146 429  -0.0158  0.558386
##  JPN  0.60141 0.211 963   0.1878  1.015062
##  KOR  0.16931 0.162 481  -0.1490  0.487662
##  SPA  0.59615 0.161 487   0.2795  0.912746
## 
## vowel = ER:
##  L1    emmean    SE  df lower.CL  upper.CL
##  CCT -0.27315 0.303 220  -0.8701  0.323801
##  CMN  0.27270 0.322 274  -0.3613  0.906648
##  ENG -0.15506 0.352 360  -0.8481  0.537977
##  JPN -0.24615 0.338 342  -0.9119  0.419582
##  KOR -0.19222 0.322 274  -0.8262  0.441738
##  SPA  0.01955 0.281 162  -0.5346  0.573660
## 
## vowel = IH:
##  L1    emmean    SE  df lower.CL  upper.CL
##  CCT  1.27846 0.128 163   1.0249  1.532066
##  CMN  1.10364 0.129 172   0.8493  1.357934
##  ENG  0.63982 0.144 248   0.3561  0.923541
##  JPN  1.15571 0.145 295   0.8701  1.441349
##  KOR  1.06609 0.134 195   0.8022  1.329939
##  SPA  0.91051 0.126 154   0.6622  1.158847
## 
## vowel = IY:
##  L1    emmean    SE  df lower.CL  upper.CL
##  CCT  1.54489 0.175 156   1.1999  1.889922
##  CMN  1.52269 0.177 163   1.1740  1.871357
##  ENG  2.08344 0.178 166   1.7327  2.434154
##  JPN  1.70147 0.209 320   1.2903  2.112613
##  KOR  1.52062 0.175 157   1.1748  1.866398
##  SPA  1.14400 0.178 166   0.7933  1.494711
## 
## vowel = UH:
##  L1    emmean    SE  df lower.CL  upper.CL
##  CCT -0.31916 0.229 168  -0.7715  0.133146
##  CMN -0.00318 0.229 168  -0.4555  0.449120
##  ENG -0.07574 0.229 168  -0.5280  0.376563
##  JPN  0.09050 0.294 438  -0.4867  0.667713
##  KOR  0.18693 0.239 202  -0.2853  0.659109
##  SPA -0.36772 0.229 168  -0.8200  0.084577
## 
## vowel = UW:
##  L1    emmean    SE  df lower.CL  upper.CL
##  CCT -0.55309 0.175 156  -0.8992 -0.206984
##  CMN -0.48823 0.173 149  -0.8300 -0.146423
##  ENG  0.35190 0.184 184  -0.0102  0.714004
##  JPN -0.37288 0.217 358  -0.8000  0.054278
##  KOR -0.22450 0.178 165  -0.5758  0.126841
##  SPA -0.54915 0.177 164  -0.8988 -0.199455
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## 
## $contrasts
## vowel = AA:
##  contrast  estimate     SE   df t.ratio p.value
##  CCT - CMN  0.53982 0.3623 1640   1.490  0.6708
##  CCT - ENG  0.58184 0.3623 1640   1.606  0.5948
##  CCT - JPN  0.61121 0.4505 1631   1.357  0.7528
##  CCT - KOR  0.21742 0.3623 1640   0.600  0.9910
##  CCT - SPA  0.07645 0.3257 1628   0.235  0.9999
##  CMN - ENG  0.04202 0.3785 1610   0.111  1.0000
##  CMN - JPN  0.07139 0.4636 1610   0.154  1.0000
##  CMN - KOR -0.32240 0.3785 1610  -0.852  0.9576
##  CMN - SPA -0.46337 0.3548 1670  -1.306  0.7817
##  ENG - JPN  0.02937 0.4636 1610   0.063  1.0000
##  ENG - KOR -0.36442 0.3785 1610  -0.963  0.9296
##  ENG - SPA -0.50539 0.3548 1670  -1.425  0.7120
##  JPN - KOR -0.39379 0.4636 1610  -0.849  0.9581
##  JPN - SPA -0.53476 0.4444 1658  -1.203  0.8354
##  KOR - SPA -0.14097 0.3548 1670  -0.397  0.9987
## 
## vowel = AE:
##  contrast  estimate     SE   df t.ratio p.value
##  CCT - CMN -0.04659 0.1023 1611  -0.456  0.9975
##  CCT - ENG -0.09177 0.1129 1623  -0.813  0.9652
##  CCT - JPN  0.14811 0.1208 1611   1.226  0.8244
##  CCT - KOR  0.08994 0.1003 1611   0.897  0.9474
##  CCT - SPA  0.27769 0.1033 1611   2.688  0.0782
##  CMN - ENG -0.04518 0.1141 1620  -0.396  0.9987
##  CMN - JPN  0.19470 0.1224 1611   1.591  0.6046
##  CMN - KOR  0.13654 0.1022 1611   1.335  0.7653
##  CMN - SPA  0.32428 0.1051 1611   3.085  0.0253
##  ENG - JPN  0.23988 0.1313 1619   1.827  0.4484
##  ENG - KOR  0.18172 0.1125 1621   1.615  0.5889
##  ENG - SPA  0.36946 0.1149 1619   3.214  0.0168
##  JPN - KOR -0.05816 0.1208 1611  -0.481  0.9968
##  JPN - SPA  0.12958 0.1234 1612   1.050  0.9008
##  KOR - SPA  0.18774 0.1033 1612   1.817  0.4552
## 
## vowel = AH:
##  contrast  estimate     SE   df t.ratio p.value
##  CCT - CMN -0.00624 0.1071 1611  -0.058  1.0000
##  CCT - ENG  0.06868 0.1114 1616   0.617  0.9899
##  CCT - JPN  0.01199 0.1308 1611   0.092  1.0000
##  CCT - KOR  0.15287 0.1084 1612   1.410  0.7209
##  CCT - SPA -0.02500 0.0988 1617  -0.253  0.9999
##  CMN - ENG  0.07491 0.1125 1617   0.666  0.9856
##  CMN - JPN  0.01823 0.1318 1613   0.138  1.0000
##  CMN - KOR  0.15911 0.1094 1611   1.455  0.6933
##  CMN - SPA -0.01876 0.0999 1615  -0.188  1.0000
##  ENG - JPN -0.05669 0.1349 1613  -0.420  0.9983
##  ENG - KOR  0.08420 0.1135 1616   0.742  0.9767
##  ENG - SPA -0.09367 0.1041 1620  -0.900  0.9466
##  JPN - KOR  0.14088 0.1329 1614   1.060  0.8970
##  JPN - SPA -0.03699 0.1249 1617  -0.296  0.9997
##  KOR - SPA -0.17787 0.1012 1614  -1.758  0.4932
## 
## vowel = AO:
##  contrast  estimate     SE   df t.ratio p.value
##  CCT - CMN -0.10714 0.0976 1615  -1.098  0.8823
##  CCT - ENG -0.07409 0.0992 1621  -0.747  0.9760
##  CCT - JPN  0.13552 0.1166 1611   1.163  0.8545
##  CCT - KOR -0.03681 0.0966 1611  -0.381  0.9990
##  CCT - SPA -0.51517 0.0990 1617  -5.202  <.0001
##  CMN - ENG  0.03305 0.1000 1614   0.330  0.9995
##  CMN - JPN  0.24265 0.1179 1616   2.058  0.3100
##  CMN - KOR  0.07033 0.0979 1613   0.718  0.9797
##  CMN - SPA -0.40804 0.0999 1612  -4.083  0.0007
##  ENG - JPN  0.20961 0.1194 1622   1.756  0.4947
##  ENG - KOR  0.03728 0.0994 1618   0.375  0.9990
##  ENG - SPA -0.44108 0.1012 1611  -4.359  0.0002
##  JPN - KOR -0.17233 0.1169 1612  -1.474  0.6812
##  JPN - SPA -0.65069 0.1192 1619  -5.460  <.0001
##  KOR - SPA -0.47836 0.0993 1615  -4.817  <.0001
## 
## vowel = EH:
##  contrast  estimate     SE   df t.ratio p.value
##  CCT - CMN  0.16479 0.1720 1617   0.958  0.9309
##  CCT - ENG  0.37002 0.1657 1643   2.233  0.2233
##  CCT - JPN  0.03992 0.2177 1619   0.183  1.0000
##  CCT - KOR  0.47202 0.1716 1612   2.751  0.0663
##  CCT - SPA  0.04518 0.1720 1617   0.263  0.9998
##  CMN - ENG  0.20523 0.1618 1625   1.269  0.8021
##  CMN - JPN -0.12487 0.2153 1613  -0.580  0.9923
##  CMN - KOR  0.30723 0.1695 1612   1.813  0.4576
##  CMN - SPA -0.11961 0.1693 1610  -0.707  0.9812
##  ENG - JPN -0.33010 0.2098 1624  -1.574  0.6162
##  ENG - KOR  0.10200 0.1624 1633   0.628  0.9890
##  ENG - SPA -0.32484 0.1618 1625  -2.008  0.3379
##  JPN - KOR  0.43210 0.2157 1617   2.003  0.3409
##  JPN - SPA  0.00526 0.2153 1613   0.024  1.0000
##  KOR - SPA -0.42684 0.1695 1612  -2.519  0.1192
## 
## vowel = ER:
##  contrast  estimate     SE   df t.ratio p.value
##  CCT - CMN -0.54584 0.3252 1616  -1.678  0.5463
##  CCT - ENG -0.11809 0.3517 1638  -0.336  0.9994
##  CCT - JPN -0.02699 0.3471 1618  -0.078  1.0000
##  CCT - KOR -0.08093 0.3252 1616  -0.249  0.9999
##  CCT - SPA -0.29270 0.2909 1621  -1.006  0.9161
##  CMN - ENG  0.42775 0.3612 1621   1.184  0.8446
##  CMN - JPN  0.51885 0.3639 1632   1.426  0.7113
##  CMN - KOR  0.46491 0.3386 1610   1.373  0.7432
##  CMN - SPA  0.25314 0.3108 1639   0.814  0.9649
##  ENG - JPN  0.09110 0.3910 1655   0.233  0.9999
##  ENG - KOR  0.03716 0.3612 1621   0.103  1.0000
##  ENG - SPA -0.17461 0.3422 1663  -0.510  0.9958
##  JPN - KOR -0.05394 0.3639 1632  -0.148  1.0000
##  JPN - SPA -0.26571 0.3278 1610  -0.811  0.9657
##  KOR - SPA -0.21177 0.3108 1639  -0.681  0.9840
## 
## vowel = IH:
##  contrast  estimate     SE   df t.ratio p.value
##  CCT - CMN  0.17482 0.1113 1613   1.571  0.6178
##  CCT - ENG  0.63864 0.1269 1624   5.034  <.0001
##  CCT - JPN  0.12275 0.1308 1616   0.939  0.9364
##  CCT - KOR  0.21237 0.1156 1612   1.837  0.4420
##  CCT - SPA  0.36795 0.1073 1612   3.430  0.0081
##  CMN - ENG  0.46381 0.1279 1622   3.626  0.0040
##  CMN - JPN -0.05207 0.1322 1613  -0.394  0.9988
##  CMN - KOR  0.03755 0.1174 1612   0.320  0.9996
##  CMN - SPA  0.19313 0.1092 1612   1.769  0.4861
##  ENG - JPN -0.51589 0.1463 1629  -3.527  0.0058
##  ENG - KOR -0.42626 0.1314 1619  -3.245  0.0152
##  ENG - SPA -0.27069 0.1256 1628  -2.156  0.2594
##  JPN - KOR  0.08962 0.1361 1616   0.658  0.9863
##  JPN - SPA  0.24520 0.1288 1613   1.904  0.4000
##  KOR - SPA  0.15558 0.1139 1613   1.366  0.7472
## 
## vowel = IY:
##  contrast  estimate     SE   df t.ratio p.value
##  CCT - CMN  0.02221 0.1636 1613   0.136  1.0000
##  CCT - ENG -0.53855 0.1640 1618  -3.284  0.0133
##  CCT - JPN -0.15658 0.1979 1612  -0.791  0.9690
##  CCT - KOR  0.02427 0.1616 1613   0.150  1.0000
##  CCT - SPA  0.40090 0.1640 1618   2.445  0.1416
##  CMN - ENG -0.56075 0.1654 1612  -3.390  0.0093
##  CMN - JPN -0.17878 0.1993 1610  -0.897  0.9472
##  CMN - KOR  0.00206 0.1633 1610   0.013  1.0000
##  CMN - SPA  0.37869 0.1654 1612   2.290  0.1988
##  ENG - JPN  0.38197 0.1994 1611   1.916  0.3926
##  ENG - KOR  0.56282 0.1635 1612   3.443  0.0078
##  ENG - SPA  0.93944 0.1652 1610   5.687  <.0001
##  JPN - KOR  0.18085 0.1977 1610   0.915  0.9428
##  JPN - SPA  0.55747 0.1994 1611   2.796  0.0585
##  KOR - SPA  0.37663 0.1635 1612   2.304  0.1929
## 
## vowel = UH:
##  contrast  estimate     SE   df t.ratio p.value
##  CCT - CMN -0.31597 0.2185 1610  -1.446  0.6988
##  CCT - ENG -0.24342 0.2185 1610  -1.114  0.8758
##  CCT - JPN -0.40966 0.2855 1613  -1.435  0.7057
##  CCT - KOR -0.50608 0.2294 1611  -2.206  0.2353
##  CCT - SPA  0.04857 0.2185 1610   0.222  0.9999
##  CMN - ENG  0.07256 0.2185 1610   0.332  0.9995
##  CMN - JPN -0.09368 0.2855 1613  -0.328  0.9995
##  CMN - KOR -0.19011 0.2294 1611  -0.829  0.9622
##  CMN - SPA  0.36454 0.2185 1610   1.668  0.5532
##  ENG - JPN -0.16624 0.2855 1613  -0.582  0.9922
##  ENG - KOR -0.26266 0.2294 1611  -1.145  0.8624
##  ENG - SPA  0.29199 0.2185 1610   1.336  0.7649
##  JPN - KOR -0.09642 0.2936 1612  -0.328  0.9995
##  JPN - SPA  0.45823 0.2855 1613   1.605  0.5954
##  KOR - SPA  0.55465 0.2294 1611   2.418  0.1505
## 
## vowel = UW:
##  contrast  estimate     SE   df t.ratio p.value
##  CCT - CMN -0.06485 0.1598 1612  -0.406  0.9986
##  CCT - ENG -0.90499 0.1685 1620  -5.372  <.0001
##  CCT - JPN -0.18021 0.2048 1617  -0.880  0.9513
##  CCT - KOR -0.32859 0.1635 1613  -2.010  0.3370
##  CCT - SPA -0.00394 0.1634 1611  -0.024  1.0000
##  CMN - ENG -0.84013 0.1676 1629  -5.014  <.0001
##  CMN - JPN -0.11535 0.2040 1623  -0.565  0.9932
##  CMN - KOR -0.26374 0.1622 1619  -1.626  0.5814
##  CMN - SPA  0.06091 0.1618 1612   0.377  0.9990
##  ENG - JPN  0.72478 0.2092 1611   3.465  0.0072
##  ENG - KOR  0.57639 0.1698 1613   3.395  0.0092
##  ENG - SPA  0.90105 0.1705 1622   5.286  <.0001
##  JPN - KOR -0.14839 0.2058 1612  -0.721  0.9794
##  JPN - SPA  0.17627 0.2064 1618   0.854  0.9571
##  KOR - SPA  0.32465 0.1655 1614   1.962  0.3649
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 6 estimates

3.2 Based on information from the csv folder

Another way to carry out the spectral analysis is to use FastTrack's initial sampling, taken every 2 ms. This information is stored in the csv folder.

3.2.1 Data processing

Let’s import all .csv files stored in the csv folder by running the loop below. Again, we’ll import female and male data separately and merge them later.

Female data:

## loading data
# index csv files in the directory
file_list <- list.files("/Volumes/Samsung_T5/data/female/output/csvs", pattern = "\\.csv$", full.names = TRUE) # note: pattern is a regular expression, not a glob

# create an empty list to store data
data_list <- list()

for(i in seq_along(file_list)){
  current_data <- read.csv(file_list[i], header = TRUE)
  
  # Add a new column with the filename
  current_data$filename <- basename(file_list[i])
  
  data_list[[i]] <- current_data
}

# bind all data from the list into a data frame
dat_f <- dplyr::bind_rows(data_list) |> 
  dplyr::relocate(filename)

# View the result
head(dat_f)
##                         filename  time    f1    b1     f2    b2     f3    b3
## 1 ALL_011_F_CMN_ENG_NWS_0001.csv 0.026 542.5 261.8 1312.4 218.7 2492.4 244.7
## 2 ALL_011_F_CMN_ENG_NWS_0001.csv 0.028 544.2 253.5 1308.2 196.9 2479.9 237.3
## 3 ALL_011_F_CMN_ENG_NWS_0001.csv 0.030 547.4 248.3 1306.2 180.4 2469.7 225.5
## 4 ALL_011_F_CMN_ENG_NWS_0001.csv 0.032 552.1 245.4 1305.6 167.4 2461.2 209.9
## 5 ALL_011_F_CMN_ENG_NWS_0001.csv 0.034 557.7 244.4 1305.9 157.0 2453.8 193.9
## 6 ALL_011_F_CMN_ENG_NWS_0001.csv 0.036 564.8 245.5 1306.9 148.5 2447.1 180.5
##     f1p    f2p    f3p    f0 intensity harmonicity
## 1 543.1 1301.4 2480.4 270.2      69.3        30.9
## 2 545.2 1302.8 2477.2 270.2      69.4        31.0
## 3 548.7 1304.9 2472.2 270.2      69.4        31.0
## 4 553.2 1307.5 2465.8 270.3      69.4        31.1
## 5 558.7 1310.1 2458.4 270.3      69.4        31.1
## 6 564.8 1312.5 2450.5 270.3      69.4        31.1

Male data:

## loading data
# index csv files in the directory
file_list <- list.files("/Volumes/Samsung_T5/data/male/output/csvs", pattern = "\\.csv$", full.names = TRUE) # note: pattern is a regular expression, not a glob

# create an empty list to store data
data_list <- list()

for(i in seq_along(file_list)){
  current_data <- read.csv(file_list[i], header = TRUE)
  
  # Add a new column with the filename
  current_data$filename <- basename(file_list[i])
  
  data_list[[i]] <- current_data
}

# bind all data from the list into a data frame
dat_m <- dplyr::bind_rows(data_list) |> 
  dplyr::relocate(filename)

# View the result
head(dat_m)
##                         filename  time    f1    b1     f2    b2     f3    b3
## 1 ALL_005_M_CMN_ENG_NWS_0001.csv 0.026 549.9 163.9 1491.2 136.0 2742.7 209.1
## 2 ALL_005_M_CMN_ENG_NWS_0001.csv 0.028 555.1 182.4 1472.8 141.1 2746.2 197.4
## 3 ALL_005_M_CMN_ENG_NWS_0001.csv 0.030 563.3 197.4 1455.9 147.3 2754.1 170.8
## 4 ALL_005_M_CMN_ENG_NWS_0001.csv 0.032 569.5 208.0 1439.2 156.5 2762.6 146.1
## 5 ALL_005_M_CMN_ENG_NWS_0001.csv 0.034 573.0 214.2 1421.6 159.1 2768.9 130.2
## 6 ALL_005_M_CMN_ENG_NWS_0001.csv 0.036 576.9 213.2 1409.1 143.7 2769.9 126.7
##     f1p    f2p    f3p    f0 intensity harmonicity
## 1 577.6 1458.8 2761.9 175.6      71.1        16.0
## 2 577.2 1457.0 2759.7 176.1      70.9        16.1
## 3 576.7 1454.1 2756.0 176.7      70.8        16.1
## 4 576.0 1450.1 2751.0 177.3      70.7        15.9
## 5 575.4 1444.9 2745.0 178.0      70.6        15.8
## 6 574.8 1438.6 2737.9 178.7      70.6        15.7
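
The female and male loops above do exactly the same job, so if you prefer a functional style you can fold them into one helper. This is just a sketch of an alternative, not part of the original workflow; `read_csv_folder` is a hypothetical helper name, and the demo reads two tiny files written to a temporary folder so it is self-contained:

```r
library(purrr)
library(dplyr)

# read every .csv in a folder and stack them, keeping the source filename
read_csv_folder <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  files |>
    purrr::set_names(basename(files)) |>
    purrr::map(read.csv, header = TRUE) |>
    dplyr::bind_rows(.id = "filename") # .id turns the list names into a column
}

# demo on two tiny csv files in a temporary folder
out_dir <- file.path(tempdir(), "csvs_demo")
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(time = c(0.026, 0.028), f1 = c(542.5, 544.2)),
          file.path(out_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(time = c(0.026, 0.028), f1 = c(549.9, 555.1)),
          file.path(out_dir, "b.csv"), row.names = FALSE)

dat_demo <- read_csv_folder(out_dir)
head(dat_demo)
```

With the real data you would call `read_csv_folder()` once on the female csvs folder and once on the male one, exactly as the loops do.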

And let’s add some relevant information.

# merge female and male data
dat <- rbind(dat_f, dat_m)

# adding speaker, L1, gender etc from the file name
dat <- dat |> 
  dplyr::mutate(
    speaker = str_sub(filename, start = 5, end = 7), # speaker ID: three digits
    speaker = as.factor(speaker),
    gender = str_sub(filename, start = 9, end = 9),  # gender: F or M
    L1 = str_sub(filename, start = 11, end = 13)     # L1: CMN, CCT ...
  )

# adding proportional time
dat <- dat |> 
  dplyr::group_by(filename) |> 
  dplyr::mutate(
    duration = max(time) - min(time),
    percent = (time - min(time)) / duration * 100 # make sure percent starts at 0 and ends at 100
  ) |> 
  dplyr::ungroup() |> 
  dplyr::relocate(filename, time, percent)

# within-speaker formant normalisation (z-scoring)
dat <- dat |> 
  dplyr::group_by(speaker) |> 
  dplyr::mutate(
    # scale() returns a one-column matrix; as.numeric() keeps a plain vector
    f1z = as.numeric(scale(f1)),
    f2z = as.numeric(scale(f2)),
    f3z = as.numeric(scale(f3))
  ) |> 
  dplyr::ungroup()
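
To see what the within-speaker normalisation buys us, here is a minimal toy illustration (synthetic numbers, not the workshop data): `scale()` centres and standardises within each group, so every speaker's normalised formants have mean 0 and standard deviation 1, even when their raw values sit on different ranges.

```r
library(dplyr)

# two toy speakers; s2's raw F1 values sit 200 Hz lower overall
toy <- data.frame(
  speaker = rep(c("s1", "s2"), each = 3),
  f1 = c(500, 600, 700, 300, 400, 500)
)

toy <- toy |>
  dplyr::group_by(speaker) |>
  dplyr::mutate(f1z = as.numeric(scale(f1))) |>
  dplyr::ungroup()

# after z-scoring, both speakers land on the same -1, 0, 1 scale
toy
```

This is why the normalised values are comparable across speakers (and across the female and male data sets) in the plots below.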

We also need to merge in the vowel information. This is where file_information.csv comes in useful, as it records the correspondence between each filename and its vowel label.

# import file_information.csv
## female
df_file_f <- readr::read_csv("/Volumes/Samsung_T5/data/female/output/file_information.csv")

## male
df_file_m <- readr::read_csv("/Volumes/Samsung_T5/data/male/output/file_information.csv")

## merge
df_file <- rbind(df_file_f, df_file_m)

# create a common key to merge two data sets
## omit the extension from the "filename" column of dat and call it "file"
dat <- dat |> 
  dplyr::mutate(
    file = str_sub(filename, start = 1, end = -5)
  ) |> 
  dplyr::relocate(file)

## same for df_file
df_file <- df_file |> 
  dplyr::mutate(
    file = str_sub(file, start = 1, end = -5)
  ) |> 
  dplyr::relocate(file)

# join df_file and dat with the "file" information
dat <- dplyr::left_join(dat, df_file, by = "file") |> 
  dplyr::rename(
    vowel = label
  )
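
The join above works row-wise on the shared key, so every 2 ms sample inherits the vowel label of its file. A toy sketch (hypothetical token names, not the workshop data) shows the behaviour: `left_join()` keeps every row of the left-hand data and attaches the matching label from the right.

```r
library(dplyr)

# three samples from two hypothetical tokens
samples <- data.frame(file = c("tok1", "tok1", "tok2"),
                      f2   = c(1310, 1320, 1650))

# one label per token, as in file_information.csv
info <- data.frame(file  = c("tok1", "tok2"),
                   label = c("AA", "IY"))

joined <- dplyr::left_join(samples, info, by = "file") |>
  dplyr::rename(vowel = label)
joined
```

A left join (rather than an inner join) is the safe choice here: samples whose file has no entry in the information table are kept, with the vowel set to NA, so nothing silently disappears.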

3.2.2 Data visualisation

Compared to the dynamic visualisation based on aggregate_data.csv, you can see that we now have much finer temporal resolution thanks to the sheer number of data points! Again, here are the time-varying changes in F2.

dat |> 
  dplyr::filter(
    L1 %in% c("ENG", "JPN") # change for different L1 pairs
    ) |> 
  ggplot(aes(x = percent, y = f2z, colour = L1)) +
  geom_point(alpha = 0.05) +
  geom_path(aes(group = number), alpha = 0.05) +
  geom_smooth(aes(group = L1), colour = "white", linewidth = 1.2) + # white outline behind the smooth
  geom_smooth(aes(group = L1)) +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
  labs(x = "proportional time", y = "normalised F2\n", title = "F2 dynamics") +
  facet_wrap( ~ vowel) +
  scale_colour_manual(values = c("brown4", "blue4")) +
  theme(axis.text = element_text(size = 10),
        axis.title = element_text(size = 15),
        strip.text.x = element_text(size = 15),
        strip.text.y = element_text(size = 15, angle = 0),
        plot.title = element_text(size = 20, hjust = 0, face = "bold")
  )  
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

4 Improving analysis

We’re now familiar with the overall workflow of acoustic analysis using FastTrack. Hooray! FastTrack is very efficient in analysing a large number of vowel tokens. In the data set above, we had a total of 1,964 tokens with the breakdown shown below:

# total number of tokens
dat |> 
  dplyr::group_by(file, L1) |> 
  dplyr::filter(
    percent == 0 # to make sure we only count one data point per file
  ) |> 
  dplyr::ungroup() |> 
  dplyr::count() 
## # A tibble: 1 × 1
##       n
##   <int>
## 1  1964
# by L1
dat |> 
  dplyr::group_by(file, L1) |> 
  dplyr::filter(
    percent == 0 # to make sure we only count one data point per file
  ) |> 
  dplyr::ungroup() |> 
  dplyr::group_by(L1) |> 
  dplyr::count() |> 
  dplyr::ungroup()
## # A tibble: 6 × 2
##   L1        n
##   <chr> <int>
## 1 CCT     368
## 2 CMN     356
## 3 ENG     318
## 4 JPN     186
## 5 KOR     352
## 6 SPA     384
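
The counts above can also be obtained without filtering on the first sample: taking one distinct (file, L1) row per token and then counting gives the same breakdown, under the assumption that each file is one token. A toy sketch (synthetic data, not the workshop set):

```r
library(dplyr)

# toy data: three files, several 2 ms samples per file
dat_toy <- data.frame(
  file = c("a", "a", "b", "b", "c"),
  L1   = c("ENG", "ENG", "ENG", "ENG", "JPN")
)

counts <- dat_toy |>
  dplyr::distinct(file, L1) |> # one row per token
  dplyr::count(L1)
counts
```

This version is a little more robust: it does not depend on the `percent` column existing, or on every file actually containing a sample at exactly 0%.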

However, it is also quite obvious that FastTrack is not free from errors. This matters especially for dynamic analysis, where we have already spotted some potential measurement errors.

4.1 Manually correcting measurement errors

FastTrack offers a few ways to address tracking errors. First, you can manually correct the formant tracks in Praat (but via FastTrack). I have personally never done this, but you can find more information about it here

4.2 Nominating different winners

An alternative approach, which I usually take, is to check the tracking accuracy of the remaining analyses and see whether any of them is 'better'. Among the output files, we briefly talked about the images_comparison folder, where visualisations are stored for all 24 (or however many you specified) analysis steps. In my experience (English /l/ and /r/), formant tracking is often inaccurate when F3 is extremely low for /r/, or when F2/F3 is high for a very clear /l/. Simply eyeballing the comparison images lets you evaluate the tracking accuracy fairly quickly (especially if you're a Mac user, as you can preview all the image files just by pressing the space bar).

When you would like to nominate a different analysis as the winner, you can tell FastTrack to return the tracking results for that particular analysis. This is done by modifying winners.csv: simply type in the number of the analysis you judge to be better. You can replace the tracking for all formants at once, or change the tracking of just one formant. Either way, don't forget to change the number in the Edit column from 0 to 1.
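
If you are renominating many winners, you could also make the edit from R rather than by hand. This is only a hypothetical sketch: winners.csv's exact layout is not shown in this document, so the column names below ("winner", "Edit") and the token names are assumptions for illustration.

```r
# toy stand-in for winners.csv (column names are assumed, not verified)
winners <- data.frame(file   = c("tok1", "tok2"),
                      winner = c(3, 7),
                      Edit   = c(0, 0))

# nominate analysis 12 for tok2 and flag the row as manually edited
winners$winner[winners$file == "tok2"] <- 12
winners$Edit[winners$file == "tok2"]   <- 1
winners
```

With the real file you would read it in with `read.csv()`, make the changes, and write it back with `write.csv(..., row.names = FALSE)` before re-running Track folder; check your own winners.csv for the actual column names first.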

Once you have nominated a different winner, you need to run Track folder again. This time, though, untick the Track formants and Autoselect winners boxes at the bottom: you're simply telling FastTrack to use a different analysis, not to track the formants all over again.

5 Session info

sessionInfo()
## R version 4.3.2 (2023-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS 15.3
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/London
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] emmeans_1.8.9   lmerTest_3.1-3  lme4_1.1-35.1   Matrix_1.6-1.1 
##  [5] emuR_2.4.2      knitr_1.45      lubridate_1.9.4 forcats_1.0.0  
##  [9] stringr_1.5.1   dplyr_1.1.4     purrr_1.0.2     readr_2.1.4    
## [13] tidyr_1.3.1     tibble_3.2.1    ggplot2_3.5.1   tidyverse_2.0.0
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6        xfun_0.50           bslib_0.7.0        
##  [4] lattice_0.21-9      numDeriv_2016.8-1.1 tzdb_0.4.0         
##  [7] vctrs_0.6.5         tools_4.3.2         generics_0.1.3     
## [10] pbkrtest_0.5.2      parallel_4.3.2      wrassp_1.0.4       
## [13] highr_0.10          pkgconfig_2.0.3     optimx_2023-10.21  
## [16] uuid_1.1-1          lifecycle_1.0.4     compiler_4.3.2     
## [19] farver_2.1.2        munsell_0.5.1       htmltools_0.5.8.1  
## [22] sass_0.4.9          yaml_2.3.8          pracma_2.4.4       
## [25] pillar_1.10.1       nloptr_2.0.3        crayon_1.5.2       
## [28] jquerylib_0.1.4     MASS_7.3-60         cachem_1.0.8       
## [31] boot_1.3-28.1       nlme_3.1-163        tidyselect_1.2.1   
## [34] digest_0.6.36       mvtnorm_1.2-5       stringi_1.8.4      
## [37] labeling_0.4.3      splines_4.3.2       fastmap_1.1.1      
## [40] grid_4.3.2          colorspace_2.1-1    cli_3.6.3          
## [43] magrittr_2.0.3      utf8_1.2.4          broom_1.0.5        
## [46] withr_3.0.2         backports_1.5.0     scales_1.3.0       
## [49] bit64_4.0.5         estimability_1.4.1  timechange_0.3.0   
## [52] rmarkdown_2.26      bit_4.0.5           png_0.1-8          
## [55] hms_1.1.3           coda_0.19-4.1       evaluate_0.23      
## [58] mgcv_1.9-0          rlang_1.1.4         Rcpp_1.0.14        
## [61] xtable_1.8-4        glue_1.8.0          DBI_1.1.3          
## [64] rstudioapi_0.15.0   vroom_1.6.5         minqa_1.2.6        
## [67] jsonlite_1.8.8      R6_2.5.1